Interactive Word Alignment for Corpus Linguistics
نویسندگان
چکیده
In this paper an incremental method and an interactive tool to improve the performance of word alignment are presented. The most important factor in our proposal is to put a human in the alignment loop. This is achieved by using interactive word alignment, i.e., a word alignment system and a human that are collaborating in order to make word alignment as efficient and accurate as possible. The aim is full coverage alignments with high accuracy, as the quality of word alignments is crucial in many applications within corpus linguistics, for example, in lexicography, contrastive linguistics and translation studies.
منابع مشابه
Do We Need Discipline-Specific Academic Word Lists? Linguistics Academic Word List (LAWL)
This corpus-based study aimed at exploring the most frequently-used academic words in linguistics and compare the wordlist with the distribution of high frequency words in Coxhead’s Academic Word List (AWL) and West’s General Service List (GSL) to examine their coverage within the linguistics corpus. To this end, a corpus of 700 linguistics research articles (LRAC), consisting of approximately ...
متن کاملMultilingwis – Explore Your Parallel Corpus
We present Multilingwis2, a web based search engine for exploration of word-aligned parallel and multiparallel corpora. Our application extends the search facilities by Clematide et al. (2016) and is designed to be easily employable on any parallel corpus comprising universal part-of-speech tags, lemmas and word alignments. In addition to corpus exploration, it has proven useful for the assessm...
متن کاملTopic Models + Word Alignment = A Flexible Framework for Extracting Bilingual Dictionary from Comparable Corpus
We propose a flexible and effective framework for extracting a bilingual dictionary from comparable corpora. Our approach is based on a novel combination of topic modeling and word alignment techniques. Intuitively, our approach works by converting a comparable document-aligned corpus into a parallel topic-aligned corpus, then learning word alignments using co-occurrence statistics. This topica...
متن کاملBuilding Monolingual Word Alignment Corpus for the Greater China Region
For a single semantic meaning, various linguistic expressions exist the Mainland China, Hong Kong and Taiwan variety of Mandarin Chinese, a.k.a., the Greater China Region (GCR). Differing from the current bilingual word alignment corpus, in this paper, we have constructed two monolingual GCR corpora. One is a 11,623-triple GCR word dictionary corpora which is automatically extracted and manuall...
متن کاملEVBCorpus - A Multi-Layer English-Vietnamese Bilingual Corpus for Studying Tasks in Comparative Linguistics
Bilingual corpora play an important role as resources not only for machine translation research and development but also for studying tasks in comparative linguistics. Manual annotation of word alignments is of significance to provide a gold-standard for developing and evaluating machine translation models and comparative linguistics tasks. This paper presents research on building an English-Vi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003